home *** CD-ROM | disk | FTP | other *** search
-
- Document Identifiers
- or
- International Standard Book Numbers for the Electronic Age
-
-
- Brewster Kahle (Brewster@think.com)
- Thinking Machines Corporation
- September 1991
- Version 2.2
-
-
- An electronic document identifier would allow computers to refer to
- documents created and maintained on other computers. This system is
- designed to allow distributed Hypertext systems as well as large electronic
- publishing structures.
-
- The doc-id is made up of three fields: a NAME, an ADDRESS, and a
- REDISTRIBUTION-DISPOSITION. Usually the name and address are the same, so
- they do not need to be repeated, and the redistribution-disposition is not
- required if redistribution is permitted.
-
- This document is a proposal for a semantics and syntax for a
- doc-id structure.
-
- First, a few examples:
- rfp-882:rfp@think.com means a document was created on the rfp
- database on think.com has a document called
- rfp-882.
- /pub/rfp-822@think.com means that an FTP name for a file
- "/pub/rfp-822" from think.com.
- This name might or might not be valid as an
- address.
- (rfp-822:rfp@nic.org, rfp-822:rfp-redist@think.com, f)
- means the original source of the document
- is on the nic.org host, but it is
- redistributed from think.com and it
- free to do further redistribution.
- The redistributor (rfp-redist@think.com)
- is an address that can be used to retrieve
- the document, but that service in not
- necessarily on think.com.
- b0-100:rfp-882:rfp@think.com Bytes 0-100 of this document.
- rfp-822:rfp-redist@think.com:z3950
- is a name of a document, it might work
- as an address, and if it does, the z3950
- service is used to access the document
-
- As can be seen from the examples there are several levels of short
- forms for the full doc-id form:
- "(original_local_doc_id:original_database@original_hostname:tcp-port,
- redistributor_local_doc_id:redistributor_database@redistributor_hostname:tcp-port,
- copyright_disposition)"
- A further restriction can be attached before the local_doc_id to indicate
- a part of a document.
-
- To use these identifiers to locate and retrieve the document, a directory
- service might is necessary.
-
- These identifiers can also be described by the types of uses they are
- expected to perform:
- 1. How to get a copy of this document from a WAIS type server?
- 2. From an FTP type server?
- 3. Is redistribution of a copy of the document allowed a copy
- of it within copyright law?
- 4. Do two document identifiers correspond to the same document?
- 5. How should a reference to a remote document be formatted in
- another document (Hypertext pointer)?
- 6. How do you refer to a piece of a remote document?
- These operations can be performed with this structure.
-
- The separate fields of a document identifier are defined as:
-
- ORIGINAL and REDISTRIBUTOR fields are another way of saying NAME and
- ADDRESS respectively. The original ID (name) is only useful for recognizing
- copies found through different paths. If a document has not be
- redistributed, then the original and redistributor are the same and one can
- be eliminated. The redistributor ID (address) can be used to find the
- document. See the section below on accessing documents using doc-id's.
-
- HOSTNAME (both original and redistributor) (eg. think.com or
- 12234343.isdn) is used as a unique name in some defined namespace. The
- function as an address is secondary to that of using it as a unique name to
- distinguish naming authorities. The syntax is an email style such as a
- Domain Name System name. The hostname can be augmented with a tcp port
- number or service if the default port is inappropriate for whichever
- service is suggested (Z39.50 or FTP for instance). The syntax would be a
- hostname:port (eg. think.com:21 or think.com:ftp). How to contact that
- hostname might not be fully specified by just giving its name (for instance
- if a login procedure is necessary, or the hostname is not a network name)
- then the database name and hostname can be used to query the directory of
- servers to find more information on that database.
-
- DATABASE (both original and redistributor) (eg. rfc) is a name of a
- database on a server that can be accessed via WAIS protocol (Z39.50). This
- name could be put in the database field. These names are picked by the
- server.
-
- LOCAL_DOC_ID (both original and redistributor) (eg. rfc-882) is a
- string that is created by a database that is an opaque object that is
- used by clients to ask for that document from the database sometime
- in the future. If the database decides to delete the underlying
- document, then the client is out of luck.
-
- DOCUMENT SECTION is a string that indicates what part of a document is
- being referred to (eg. b0-1000 is bytes [0 1000)). If it is not present,
- then the whole document is assumed. This syntax allows for different types
- of sections to be used, such as "l" for line, "p" for page, "c" for
- chapter, "f" for frame, etc. Only "b" and "l" are specified at this time.
-
- COPYRIGHT_DISPOSITION is a field that describes the redistribution
- rights of the document it points to. This values of field are not fully
- specified. Known value:
- f Free to redistribute
- r Restricted, so the doc-id can be redistributed, but the
- receiver must get the copy from the redistributor itself.
- If it is not specified, then f is assumed.
-
- The goals are for the doc-id to be:
- 1) easy to create unique IDs for documents (without a central authority),
- 2) possible to retrieve the document using the ID in many cases
- (serve as an address),
- 3) allow users of the IDs to know the copyright intent of the publisher,
- 4) allow users to know when they have two references to the same document,
- 5) and be terse.
-
- Syntax:
- The syntax of the fields follows the email standards. Thus, if a space
- is inbedded in a database name then it can be put in quotation marks (").
- (NOTE: I dont know how email handles non-ascii. I would be be in favor of
- using lisp's printing rules so "\125" would be the character 125 in ascii.
- The reason for Lisp is that it is well defined).
-
-
- Common operations on doc-ids:
-
- Do two doc-id's refer to the same document? Compare the original
- doc-id field to see if they are the same. If there is only one field (a
- shorthand) then that is the original doc-id.
-
- How to I retrieve a document from a doc-id? The full answer is to
- ask the directory of server the question
- "redistributor_database@redistributor_hostname:tcp-port" and it will return
- a source structure (see another specification for this structure) which
- contains contact information. In many cases, however, the
- redistributor_hostname and tcp-port can be used to contact the host
- directly to make the retrieval request. To know if it is an FTP or WAIS
- service, there are two indications:
- If the tcp service is specified, use it.
- If there is no redistributor_local_doc_id then it is an FTP doc-id.
- (eg. /pub/foo@think.com as opposed to rfc-23:foo@think.com)
- If the databasename starts with a "~" or "/" then it is a FTP
- filename. (this method is not preferred)
- Otherwise it is a WAIS doc-id.
-
- How do I reference about a database (not a full doc-id)? Just use
- the database@hostname:tcp-port or database@hostname form. This can be a
- handy shorthand in user interfaces.
-
- When can I used shortened forms? If there is redundant
- information, eliminate it. If the hostname is localhost, for instance, it
- can be eliminated, making an ftp-able file from a local machine be just the
- pathname. In a reference in a paper, just use the original doc-id.
-
-
- OPEN ISSUES: Are any of the fields case sensitive? What is the syntax for
- a phonenumber as hostname (eg foo@0116175551212.phone)? How are non-ascii
- characters quoted in these strings (eg \234 for the decimal byte 234)?
-